1. INTRODUCTION



In a previous report (Reference 1), the author proposed
the use of sequential (successive) differences as an aid in
identifying outlier data points and in selecting the appropriate
order of polynomial for smoothing 3-D data on torpedo and target
paths. In this report, the concept of successive differences
is explored and developed with the specific intent of making it
suitable for inclusion in a computer program for smoothing
3-D data.

This report is a working paper rather than a polished
formal report. Some of the discussions presented are rather
lengthy, and points of interest are, perhaps, belabored or
repeated unnecessarily. The reader's indulgence is invited and
some skimming is expected. Nevertheless, the general picture
appears clear and the possibility of using the model for
identification of outliers appears reasonable.
2. DEVELOPMENT OF MODEL

A. General Considerations

For the purposes of this analysis, it will be assumed
that an observed datum can be expressed in the form

$$x_i = x(t_i) = P_x(t_i) + n_i + d_i$$

where $P_x(t)$ is a polynomial in time $t$, $n_i$ is a measurement
error which will be called "noise," and $d_i$ is a perturbation
or disturbance which, if present with sufficient amplitude,
will cause $x_i$ to be a "wild" datum or outlier.

It will be assumed that each component (x, y, z) of a
torpedo (T) or target (submarine, S) path can be represented
as a polynomial of some low degree $k$ in time $t$. (It is
suggested that the restriction $k \le 4$ be incorporated in the
smoothing algorithm.) Thus

$$P_x(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_k t^k .$$

The noise component, $n_i$, is assumed to be a realization
of a random variable $N_i$ which is Normally distributed with
mean 0 and common variance $\sigma^2$ ($N_i \sim N(0, \sigma^2)$),
and it is also assumed that noise components $N_i$ and $N_j$ at
times $t_i$ and $t_j$ are independent.
Finally, it will be assumed that a disturbance $d_i$
occurs fairly rarely. Evidence of a non-zero value of $d_i$
can be obtained from examination of successive differences
which, when sufficiently high order differences are considered,
are functions of the $(n_i + d_i)$'s and not of the $P(t_i)$'s.
Crossing of a threshold value for successive differences, which
is seldom crossed when no $d_i$'s are present, can then be used
as an indication of the presence of a disturbance $d_i$ and hence
of an outlier point. Note that not only can noise alone cause an
occasional crossing, depending on the threshold selected, but
also the presence of a disturbance may fail to cause a threshold
crossing, depending on its magnitude and its interaction with
noise. This will be elaborated as the development of the model
progresses.



B. Successive Differences

A definition of successive or sequential differences
suitable for our purposes is presented in the accompanying
table (Table 1) and the notation which follows. Since the
3-D data to be smoothed involve data points equally spaced
in time, this has been incorporated in the model. Further,
the initial time for any data segment can be arbitrarily set
to zero for model development, hence $t_0 = 0$. Also, selection
of the common time interval as the unit of time yields
$t_{i+1} = t_i + 1$.

SUCCESSIVE DIFFERENCES

The selection of the secondary subscript $i$ in the
ordered differences is somewhat arbitrary. As will be noted when
disturbances are introduced, it appears desirable for
computational convenience to identify the even ordered differences
($D_{2i}$ and $D_{4i}$) with the observation $x_i$ for each $i$. For
example, a large isolated disturbance $d_i$ in $x_i$ will produce
large perturbations in $D_{2i}$ and $D_{4i}$ and hence the latter can
be used to identify $x_i$ as an 'outlier.' For the odd ordered
differences ($D_{1i}$ and $D_{3i}$) the situation is not as clear.
For example, if a large perturbation is observed in $D_{1i}$ it
is not clearly evident whether $x_i$ or $x_{i-1}$ should be
considered as the 'outlier.' At this stage in the development,
it would appear that the even ordered successive differences
should be the primary identifiers of 'outliers.'
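
The index convention discussed above can be sketched in a few lines. This is only an illustration, assuming the definitions implied by the noise relations of Section D ($D_{1i} = x_i - x_{i-1}$, $D_{2i} = x_{i+1} - 2x_i + x_{i-1}$, $D_{3i} = x_{i+1} - 3x_i + 3x_{i-1} - x_{i-2}$, $D_{4i} = x_{i+2} - 4x_{i+1} + 6x_i - 4x_{i-1} + x_{i-2}$). The names `difference_coefficients` and `successive_differences` are hypothetical and are not taken from the smoothing program.

```python
from math import comb


def difference_coefficients(j):
    """Weights of D_ji keyed by offset from i (assumed index convention)."""
    # j = 1, 2 start at x_{i-1}; j = 3, 4 start at x_{i-2}.
    first = -((j + 1) // 2)
    return {first + k: (-1) ** (j - k) * comb(j, k) for k in range(j + 1)}


def successive_differences(x, j):
    """Return {i: D_ji} for every i at which all needed observations exist."""
    c = difference_coefficients(j)
    lo, hi = -min(c), len(x) - max(c)
    return {i: sum(w * x[i + off] for off, w in c.items()) for i in range(lo, hi)}


# An isolated disturbance of size 5 at i = 4 on an otherwise flat record.
x = [0.0] * 9
x[4] = 5.0
print(successive_differences(x, 2))  # largest magnitude at i = 4
print(successive_differences(x, 4))  # largest magnitude at i = 4
print(successive_differences(x, 3))  # equal and opposite at i = 4 and i = 5
```

With this convention the even ordered differences peak at the index of the disturbed observation, while the third order differences straddle it, which is the ambiguity noted above.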



C. The Polynomial Component

To illustrate the contribution of the polynomial component
to successive differences, three cases (linear, quadratic, and
cubic polynomials) are presented in Tables 2.1, 2.2 and 2.3.
It can readily be seen that there is a contribution of a
polynomial of degree $k$ to $D_{ji}$ for $j \le k$ but that for
$j > k$ the number $D_{ji}$ represents noise only unless a
disturbance is present. Thus detection of a disturbance, and
hence identification of an outlier, becomes simpler if a
sufficiently high order difference can be used and the
polynomial component eliminated.
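
The elimination of the polynomial component can be checked numerically. The sketch below assumes unit time spacing with $t_0 = 0$, as adopted in Section B; the helper names and the example coefficients are illustrative only.

```python
def polynomial(coeffs, t):
    """Evaluate a_0 + a_1 t + a_2 t^2 + ... for coefficients (a_0, a_1, ...)."""
    return sum(a * t ** p for p, a in enumerate(coeffs))


def differences(values, order):
    """Apply the first-difference operator `order` times."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


times = range(10)                       # t_i = i
cubic = [polynomial((2, -1, 3, 1), t) for t in times]

print(differences(cubic, 3))            # constant: 3! * a_3 = 6
print(differences(cubic, 4))            # all zero: polynomial eliminated
```

For a polynomial of degree $k$ the $k$th differences reduce to the constant $k!\,a_k$ and all higher differences vanish, so any non-zero value remaining in $D_{ji}$ for $j > k$ must come from noise or a disturbance.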
TABLE 2.1

SUCCESSIVE DIFFERENCES

Linear Case: $x_i = x(t_i) = a_0 + a_1 t_i + n_i$

TABLE 2.2

Quadratic Case: $x_i = x(t_i) = a_0 + a_1 t_i + a_2 t_i^2 + n_i$

$t_0 = 0$, $t_{i+1} = t_i + 1$, $N_i \sim N(0, \sigma^2)$
The question of how high the order of the difference must
be to eliminate the polynomial component is not clear-cut. As
a matter of fact, the polynomial component does not have to be
eliminated entirely for a particular order of successive
differences to be used to identify outliers. It is sufficient that
the contribution of the polynomial component be small with
respect to the noise component for $D_{ji}$ to be useful as
an indicator of a disturbance $d_i$ in $x_i$.

(This is intimately related to the problem of fitting
polynomials to segments of a torpedo path. If (1) the torpedo
path does not change too radically, (2) the length of the path
segment to be fitted is short enough, and (3) the data rate is
high enough, then low order polynomials can provide satisfactory
approximations to the path. In Reference 1, path segments of
21 and 11 points were explored briefly. Path segments consisting
of 7 points have been suggested but not examined as yet. In
many of the segments examined, polynomials of order $k \le 3$
produced acceptably small and apparently random residual errors
for 11 point segments.)

From Tables 1 and 2.1-2.3 it can be seen that a successive
difference $D_{ji}$ of order $j$ involves $j+1$ successive
observations $x_i$. For $j \le 4$, as proposed for screening for
outliers, at most five data points are involved. These can be
fitted reasonably well by polynomials of order $k \le 3$.
Supporting evidence for this is available in the successive
differences for the 3-D data on the torpedo run examined in this
study. Discussion of the analysis justifying this contention will
be presented in a later section.
An alternative has been suggested. It incorporates
control information (information obtained by alternate means on
the command and control of a torpedo) to provide appropriate
values for the polynomial coefficients and to indicate the
appropriate polynomial order for fitting data. In the linear case
this information should be in the form of a specific value or
bound for $a_1$. Since $a_1 = |V| \cos\theta$, as illustrated in
the accompanying sketch with $V$ a velocity vector and $|V|$ the
magnitude of $V$, one possible bound for $a_1$ would be
$|a_1| \le |V|$.

This will be shown to dominate the noise component for
3-D data. Information from control data on $\theta$ could be used
but would require $a_1$ (and hence the threshold $D_1^*$) to be
treated as a function of position on the torpedo path and
hence as a function of $t_i$. For the purpose of preliminary
screening for outliers, it would appear preferable to
concentrate on successive differences of sufficiently high order
that the polynomial component can be considered negligible.
With this constraint, a constant threshold $D_j^*$ can be used
for all successive differences of order $j$.
D. The Noise Component

When the polynomial component has been eliminated,
attention can be concentrated on the noise component $n_{ji}$ of
the $j$th order successive differences. In engineering parlance,
the problem of identifying outliers can now be considered as
one of detecting a signal (a disturbance $d_i$) in the presence
of noise ($n_{ji}$). The thresholds $D_j^*$ can be expressed as
specified levels of $n_{ji}$ which are seldom exceeded by noise
only and hence which indicate the presence of a disturbance
$d_i$. In order to establish values for $D_j^*$, a statistical
analysis of the noise component is required.

Recall the assumptions in Section 2.A that the noise
component $n_i$ is a realization of a random variable $N_i$ with
$N_i \sim N(0, \sigma^2)$ and that $N_i$ and $N_j$ are independent
for $i \ne j$. It can be established from the definitions of
successive differences that the noise component $n_{ji}$ of
$D_{ji}$ can be defined in terms of the noise components $n_i$
of $x_i$ as follows:

$$n_{1i} = n_i - n_{i-1}$$

$$n_{2i} = n_{i+1} - 2n_i + n_{i-1}$$

$$n_{3i} = n_{i+1} - 3n_i + 3n_{i-1} - n_{i-2}$$

$$n_{4i} = n_{i+2} - 4n_{i+1} + 6n_i - 4n_{i-1} + n_{i-2}$$
Each of these noise components has mean 0 since the $n_i$'s
are assumed to have mean 0.

The variance $V_j$ of $N_{ji}$ can be expressed in terms of
the common variance $\sigma^2$ of the $n_i$'s using the
independence property of the $n_i$'s. These are presented below
together with some of the covariances of interest later.

1st Order Noise Differences ($n_{1i}$)

    $V_1 = 2\sigma^2$
    $C(n_{1i}, n_{1,i+1}) = -\sigma^2$

2nd Order Noise Differences ($n_{2i}$)

    $V_2 = 6\sigma^2$
    $C(n_{2i}, n_{2,i+1}) = -4\sigma^2$
    $C(n_{2i}, n_{2,i+2}) = \sigma^2$

3rd Order Noise Differences ($n_{3i}$)

    $V_3 = 20\sigma^2$
    $C(n_{3i}, n_{3,i+1}) = -12\sigma^2$
    $C(n_{3i}, n_{3,i+2}) = 6\sigma^2$

4th Order Noise Differences ($n_{4i}$)

    $V_4 = 70\sigma^2$
    $C(n_{4i}, n_{4,i+1}) = -56\sigma^2$
    $C(n_{4i}, n_{4,i+2}) = 28\sigma^2$
Selected Covariances

    $C(n_{2i}, n_{3i}) = 10\sigma^2$
    $C(n_{2i}, n_{3,i+1}) = -10\sigma^2$
    $C(n_{2i}, n_{4i}) = -20\sigma^2$
    $C(n_{3i}, n_{4i}) = -35\sigma^2$
    $C(n_{3,i+1}, n_{4i}) = 35\sigma^2$
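
Because the $n_i$'s are independent with common variance $\sigma^2$, each variance or covariance is the sum of products of the weights that two differences place on the same observation. The sketch below is a check under that assumption, using the difference definitions given earlier in this section; the function names are hypothetical. Results are in units of $\sigma^2$.

```python
from math import comb


def weights(j):
    """Weights of n_ji on n_{i+offset}, keyed by offset (assumed convention)."""
    first = -((j + 1) // 2)
    return {first + k: (-1) ** (j - k) * comb(j, k) for k in range(j + 1)}


def covariance(j1, j2, lag=0):
    """C(n_{j1,i}, n_{j2,i+lag}) in units of sigma^2 for independent noise."""
    a = weights(j1)
    b = {off + lag: w for off, w in weights(j2).items()}
    return sum(w * b[off] for off, w in a.items() if off in b)


for j in (1, 2, 3, 4):
    print(j, [covariance(j, j, lag) for lag in (0, 1, 2)])
print(covariance(2, 3), covariance(2, 3, 1), covariance(2, 4))
print(covariance(3, 4), covariance(4, 3, 1))
```

Under these definitions the computation reproduces the listed values except for the lag-one covariance of the third order differences, for which it gives $-15\sigma^2$ rather than the $-12\sigma^2$ printed above; the reader may wish to check that entry against the definitions in Table 1.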



Since all the $N_{ji}$'s are normally distributed with mean 0,
it can be established that

$$P(|N_{ji}| \le 3\sqrt{V_j}) = 0.997 .$$

If we set $D_j^* = 3\sqrt{V_j}$ then, for applications in which the
polynomial contributions to $D_{ji}$ have been eliminated, there
will be, on the average, less than one time in 200 independent
trials in which $|D_{ji}|$ will exceed $D_j^*$ due to noise alone.
The suggested thresholds for detection of disturbances are given
below.

    $j$        0            1              2               3                4
    $D_j^*$    $3\sigma$    $4.24\sigma$   $7.348\sigma$   $13.416\sigma$   $25.10\sigma$



The term with $j = 0$ corresponds to $V_0 = \sigma^2$ (i.e.,
the variance of $N_i$ and hence of $X_i$ when no polynomial
is involved).
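
The tabulated thresholds can be reproduced from the variances. The sketch below relies on the identity that the squared binomial weights of a $j$th difference sum to $\binom{2j}{j}$, so that $V_j = \binom{2j}{j}\sigma^2$ (1, 2, 6, 20, 70 for $j = 0, \ldots, 4$). The function names are illustrative, and the multiplier 3 is the one chosen in the text.

```python
from math import comb, erfc, sqrt


def threshold_factor(j, multiplier=3.0):
    """D_j* / sigma = multiplier * sqrt(V_j / sigma^2), with V_j = C(2j, j) sigma^2."""
    return multiplier * sqrt(comb(2 * j, j))


def noise_exceedance_probability(multiplier=3.0):
    """P(|N| > multiplier * std. dev.) for a zero-mean Normal variable."""
    return erfc(multiplier / sqrt(2.0))


for j in range(5):
    print(j, round(threshold_factor(j), 3))
print(noise_exceedance_probability())   # about 0.0027, i.e. under 1 in 200
```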
The suggested thresholds are worth some further exploration.
As an oversimplified case consider a situation in which no
polynomial contributions are involved, $n_k = 3\sigma$ for some
$k$, and $n_i = 0$ for $i \ne k$. The relationships of the
$D_{jk}$'s to the $D_j^*$'s are shown in the following table.

    $j$                  0            1              2              3              4
    $D_{jk} = n_{jk}$    $3\sigma$    $3\sigma$      $-6\sigma$     $-9\sigma$     $18\sigma$
    $D_j^*$              $3\sigma$    $4.24\sigma$   $7.35\sigma$   $13.4\sigma$   $25.1\sigma$
    $|D_{jk}| / D_j^*$   1            .707           .816           .671           .717



Since $|D_{2k}| / D_2^*$ is greater than the corresponding
expression for $j = 3$ or $j = 4$, it could be anticipated that
the second order differences (the $D_{2i}$'s) might be better
detectors for disturbances when the polynomial contribution is
linear. This will be demonstrated for an isolated disturbance in
a later section of this report.

The type of information to be seen in the special case of
an isolated noise element $n_k$ can be generalized. The
covariances are useful for this purpose. Compare the special
case with the covariances:

    Special Case                                  Covariance

    $D_{2k} = -6\sigma$, $D_{2,k+1} = 3\sigma$    $C(n_{2i}, n_{2,i+1}) = -4\sigma^2$
    $D_{2k} = -6\sigma$, $D_{3k} = -9\sigma$      $C(n_{2i}, n_{3i}) = +10\sigma^2$
    $D_{2k} = -6\sigma$, $D_{4k} = 18\sigma$      $C(n_{2i}, n_{4i}) = -20\sigma^2$

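This relationship can, perhaps, be made clearer by considering
the correlation coefficients. For example,

$$r(n_{2i}, n_{4i}) = \frac{C(n_{2i}, n_{4i})}{\sqrt{V_2 V_4}} = \frac{-20\sigma^2}{\sqrt{(6\sigma^2)(70\sigma^2)}} = -0.976 .$$

The other correlation coefficients of interest here are

$$r(n_{2i}, n_{2,i+1}) = \frac{-4}{6} = -0.667 ,$$

$$r(n_{2i}, n_{3i}) = \frac{10}{\sqrt{120}} = 0.913 ,$$

$$r(n_{3i}, n_{3,i+1}) = \frac{-12}{20} = -0.6 ,$$

$$r(n_{3i}, n_{4i}) = \frac{-35}{\sqrt{1400}} = -0.935 ,$$

and

$$r(n_{4i}, n_{4,i+1}) = \frac{-56}{70} = -0.8 .$$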
These can be interpreted as follows. In general, if $n_{2i}$
has a large value, then $n_{2,i+1}$ and $n_{4i}$ can be expected to
have fairly large values of the opposite sign and $n_{3i}$ a fairly
large value of the same sign. The importance of this in detecting
outliers is that the information provided by different orders of
differences at the same point and by differences of the same
order at adjacent points is primarily confirmatory rather than
complementary. In more practical terms, if, for example, a
disturbance in $x_i$ does not cause a crossing of $D_4^*$ by
$D_{4i}$, then it will usually not cause a threshold crossing by
$D_{2i}$, $D_{3i}$, $D_{4,i-1}$ or $D_{4,i+1}$. On the other hand,
if $D_{4i}$ exceeds $D_4^*$ in magnitude, then one or more of these
other differences has a reasonable chance of crossing its
prescribed threshold.

As a consequence of the confirmatory nature of threshold
crossings and of the fact that $D_{4i}$ is less likely to be
contaminated by a polynomial component, it is suggested that the
testing for outliers be performed by testing only fourth order
differences (the $D_{4i}$'s) for crossing of the appropriate
threshold $D_4^*$.

Before considering the disturbance component of $x_i$,
it would be of interest to consider the relative magnitudes
of polynomial and noise components of 3-D data. Of particular
interest here is the comparison of $a_1$ with $\sigma_{n_1}$, the
standard deviation of the first order noise differences, since
these are the vital components if the first order differences are
to be used for detecting outliers. Since $a_1 = |V| \cos\theta$, it
can be seen that $a_1$ achieves its maximum magnitude when
$\theta = 0^\circ$ or $\theta = 180^\circ$. A plot of the path of
the torpedo in the torpedo run selected for examination in this
study and the corresponding data together with the first four
orders of differences are presented in Appendix A. It can be seen
that $\theta = 0^\circ$ occurs in the vicinity of $t = 950$ and
$\theta = 180^\circ$ occurs in the vicinities of $t = 807$, 853,
and 917. An approximate value of $|V|$ is satisfactory for the
present purposes and the value $|V| = 95$ will be used.

Establishment of a bound for the noise in the form

$$P(|n_{1i}| > 3\sigma_{n_1}) < 0.01 ,$$

with $\sigma_{n_1}^2 = 2\sigma^2$, requires estimation of
$\sigma^2$, the noise variance. In Reference 1, estimates of
$\sigma$ as low as 2 or 3 were obtained for selected segments of
the torpedo run to be used here. It will be assumed for this
examination that $\sigma = 4$ and hence that
$\sigma_{n_1} = 5.656$ and $3\sigma_{n_1} = 17$.

A boundary for $D_{1i}$ can then be set in the form

$$D_1^* = \pm [\,|V| + 3\sigma_{n_1}] = \pm 112 .$$

Thus, only if $D_{1i}$ were greater than +112 or less than -112
would a disturbance be indicated. Using the formula

$$D_1^* = |V| \cos\theta \pm 3\sigma_{n_1}$$
when $\theta$ is given we have

                        $D_1^*$
    $\theta$       Lower threshold       Upper threshold
    $0^\circ$      95 - 17 = 78          95 + 17 = 112
    $90^\circ$     -17                   +17
    $180^\circ$    -95 - 17 = -112       -95 + 17 = -78

It can be seen that detection of disturbances in the first order
differences will not be reliable, unless $\theta = 0^\circ$ or
$180^\circ$, when a general threshold of the form

$$D_1^* = \pm [\,|V| + 3\sigma_{n_1}]$$

is used.
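
The dependence of the first order band on heading can be sketched as follows. The default values $|V| = 95$ and $\sigma = 4$ are the ones assumed in the text; the function name `first_order_band` is hypothetical.

```python
import math


def first_order_band(theta_deg, speed=95.0, sigma=4.0):
    """Lower and upper thresholds |V| cos(theta) -/+ 3 sigma_n1 for D_1i."""
    sigma_n1 = math.sqrt(2.0) * sigma            # sigma_n1^2 = 2 sigma^2
    centre = speed * math.cos(math.radians(theta_deg))
    return centre - 3.0 * sigma_n1, centre + 3.0 * sigma_n1


for theta in (0, 90, 180):
    lower, upper = first_order_band(theta)
    print(theta, round(lower), round(upper))
```

The band is only about 34 units wide at any one heading, whereas the general threshold must span roughly 224 units to cover all headings, which is why the heading-independent form is so insensitive.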



E. The Disturbance Component

The presence of a disturbance or perturbation in an
observation can be represented as an additional component $d_i$
so that

$$x_i = x(t_i) = P(t_i) + n_i + d_i .$$

There are several types of perturbations that could be
considered. One of these, an 'outlier' or isolated disturbance
$d_i$ that occurs in only one observation $x_i$, is the simplest.
The effects of such a disturbance are shown in Table 3.1 and the
accompanying sketch, Figure 3.1. In the sketch both $d$ and the
$D_j^*$'s are expressed in terms of the parameter $\sigma$ (the
standard deviation of the noise component $n_i$). The value
$d = 5\sigma$ is used for illustrative purposes. Also note that
the ordinate is $D_{ji} - P_{ji}$ (the difference less its
polynomial contribution) and hence represents only the
disturbance component of $D_{ji}$.

There are several features of the successive differences
that should be noted when an isolated perturbation occurs. First,
consider an observation $x_r$ (in our example $x_r = x_4$)
consisting of an isolated disturbance $d = k\sigma$ without any
noise ($n_{ji} = 0$ for all $j$ and $i$) and with polynomial
component $P(t_i) = a_0 + a_1 t_i$. The values of $k$ for which
the thresholds (the $D_j^*$'s) are achieved are shown below.

    $j$           2              3              4
    $|D_{j4}|$    $2k\sigma$     $3k\sigma$     $6k\sigma$
    $D_j^*$       $7.35\sigma$   $13.4\sigma$   $25.1\sigma$
    Critical k    3.675          4.467          4.183
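
The critical values of $k$ follow directly from the central weight of each difference and the corresponding threshold. The sketch below assumes a noise-free isolated disturbance and the $3\sqrt{V_j}$ thresholds of Section D; the function name is illustrative.

```python
from math import comb, sqrt


def critical_k(j, multiplier=3.0):
    """Smallest d / sigma at which an isolated, noise-free disturbance reaches D_j*."""
    peak_weight = comb(j, j // 2)                 # 2, 3, 6 for j = 2, 3, 4
    threshold = multiplier * sqrt(comb(2 * j, j))  # D_j* / sigma
    return threshold / peak_weight


for j in (2, 3, 4):
    print(j, round(critical_k(j), 3))
```

Computed from unrounded thresholds, the third order value comes out near 4.472 rather than 4.467; the small difference is due to the use of the rounded threshold $13.4\sigma$ in the table.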



In the absence of the noise and polynomial components,
the second order difference $D_{2r}$ will provide a threshold
crossing for a smaller isolated disturbance ($d \ge 3.675\sigma$)
than either the third order difference ($d \ge 4.467\sigma$) or
the fourth order difference ($d \ge 4.183\sigma$) and is slightly
better than

If assurances could be given that the polynomial component was
no higher than the first degree, then the second order
differences (the $D_{2i}$'s) would appear to provide the most
sensitive location to test for isolated disturbances. If
polynomial components of the second or third degrees are possible
then the fourth order differences (the $D_{4i}$'s) appear to be
preferable for testing.

Next, consider the pattern or signature produced in the
ordered differences by an isolated disturbance at $t_r$. Both
$D_{2i}$ and $D_{4i}$ will contain their maximum contributions
from the disturbance at $D_{2r}$ and $D_{4r}$ (of opposite signs)
and both will have substantial but smaller contributions of
opposite signs at the adjacent points ($D_{2,r-1}$ and
$D_{2,r+1}$, and $D_{4,r-1}$ and $D_{4,r+1}$). The third order
differences (the $D_{3i}$'s) will have contributions of equal
magnitudes but opposite signs at adjacent positions ($D_{3r}$
and $D_{3,r+1}$) and smaller contributions at the next positions.
This signature, although clearly recognizable in the graph (see
Fig. 3.1), would be difficult to incorporate in a program for
automatic computer filtering of outliers.

The last item for discussion of isolated disturbances
pertains to the addition of noise and disturbance components.
Consider now a disturbance $d = 5\sigma$ in $x_r$ ($x_4$ in
Table 3.1) and its effect on $D_{4r}$ in the presence of noise. A
positive value of $n_{4r}$ will enhance crossing the threshold
$D_4^*$, so attention can be directed to the effects of negative
values for $n_{4r}$. If

$$n_{4r} < -(30\sigma - 25.1\sigma) = -4.9\sigma = -\frac{4.9}{\sqrt{70}}\,\sigma_{n_4} = -0.586\,\sigma_{n_4} ,$$

then $D_{4r}$ will not cross the upper threshold
$D_4^* = 25.1\sigma$. For this situation the probability of a
threshold crossing is $P(D_{4r} > D_4^*) = 0.721$. When $n_{4r}$
is this negative, $n_{4,r-1}$ and $n_{4,r+1}$ will, in general, be
positive since

$$r(n_{4i}, n_{4,i+1}) = -0.8 \quad \text{(Section D)}$$

and hence neither $D_{4,r-1}$ nor $D_{4,r+1}$ can be expected to
cross the lower threshold $-D_4^* = -25.1\sigma$. Also, as a
consequence of $r(n_{2i}, n_{4i}) = -0.976$, a negative value for
$n_{4r}$ can be expected to be accompanied by a positive value
for $n_{2r}$ and hence $D_{2r}$ will not cross the lower threshold
$-D_2^* = -7.35\sigma$. Further, since
$r(n_{2i}, n_{2,i+1}) = -0.667$, neither $D_{2,r-1}$ nor
$D_{2,r+1}$ can be expected to cross the upper threshold
$D_2^* = +7.35\sigma$. Similarly the correlations
$r(n_{3i}, n_{4i}) = -0.935$ and $r(n_{3i}, n_{3,i+1}) = -0.6$
make it unlikely that either $D_{3r}$ or $D_{3,r+1}$ will cross
the lower threshold $-D_3^* = -13.4\sigma$ or the upper threshold
$D_3^* = +13.4\sigma$, respectively.
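
The crossing probability quoted above can be reproduced with the Normal distribution function. This is a sketch under the stated assumptions only: a linear path, an isolated disturbance $d = k\sigma$ contributing $6k\sigma$ to $D_{4r}$, and $n_{4r}$ Normal with variance $70\sigma^2$. The function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def fourth_order_crossing_probability(k):
    """P(D_4r > D_4*) for an isolated disturbance d = k * sigma."""
    sigma_n4 = sqrt(70.0)                 # std. dev. of n_4r in units of sigma
    margin = 6.0 * k - 3.0 * sigma_n4     # disturbance contribution minus threshold
    return normal_cdf(margin / sigma_n4)


print(round(fourth_order_crossing_probability(5.0), 3))   # close to the .721 quoted
print(round(fourth_order_crossing_probability(4.1833), 3))  # about 0.5 at the critical k
```

At the critical value of $k$ a crossing is an even chance, so a disturbance must be noticeably larger than the noise-free critical value before detection becomes dependable.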
TABLE 3.1

SUCCESSIVE DIFFERENCES

Linear Case: Isolated Disturbance d

FIGURE 3.1
The proposed use of only one order of successive difference
(namely, $D_{4i}$) to test for outliers appears reasonable for
isolated disturbances. If $D_{4r}$ exceeds its threshold then this
will usually be accompanied by $D_{2r}$ and $D_{3r}$ exceeding
their thresholds in the opposite direction.

Attention can now be directed to disturbances other than
isolated ones. Consider, next, a situation involving
disturbances $d_a$ and $d_b$ in two observations. For simplicity,
it will be assumed that they have the same magnitude, $d$, but
can differ in sign and/or location. The situation with two
adjacent disturbances of the same sign is presented in Table 3.2
and Figure 3.2. Note that the magnitudes of the contributions of
the disturbances to $D_{4a}$ and $D_{4b}$ ($D_{4r}$ and
$D_{4,r+1}$ for equal disturbances in $x_r$ and $x_{r+1}$) are
substantially reduced from those in the case of an isolated
disturbance, as are the contributions to the next adjacent
observations. It is evident that large adjacent disturbances of
the same sign will be less likely to cause threshold crossings.
Note that a large noise component in one observation ($n_{4r}$,
for example) will, in general, be accompanied by a large noise
component of the opposite sign ($r(n_{4i}, n_{4,i+1}) = -0.8$) in
the other observation and hence enhance the probability of a
threshold crossing by one of the differences $D_{4r}$ or
$D_{4,r+1}$. In general, two adjacent large values of the same
sign in $D_{2i}$ or $D_{4i}$ are a signature of adjacent
disturbances of the same sign. (The possibility of using reduced
thresholds for this situation has not been explored.) The
magnitudes of the $D_{3i}$'s are also smaller than in the single
disturbance situation and are separated by an observation
($D_{35}$) involving noise only.

TABLE 3.2

SUCCESSIVE DIFFERENCES

Linear Case: Adjacent Equal Disturbances

Next, consider adjacent disturbances of equal magnitudes 
but opposite signs. This situation is presented in Table 3.3 
and Figure 3.3. The additive, or magnification, effect of the 
opposing signs should make even moderate magnitudes of the 
disturbances readily detectable. The pattern or signature 
should be clearly evident. It is suspected, however, that the 
occurrence of this situation in real-life data would be 
extremely rare in comparison to the previous situation. 
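
The disturbance signatures discussed here can be generated for any pattern of disturbances. The following is a minimal sketch, assuming the fourth order weights (1, -4, 6, -4, 1) centred on $i$ and no noise or polynomial component; the function name is hypothetical and disturbances are expressed in units of $d$.

```python
# Offsets from i mapped to the weights of D_4i (assumed centred convention).
FOURTH_ORDER_WEIGHTS = {-2: 1, -1: -4, 0: 6, 1: -4, 2: 1}


def fourth_difference_signature(disturbances):
    """Disturbance-only contribution to D_4i at each index i (zero outside the record)."""
    n = len(disturbances)
    return [
        sum(w * disturbances[i + off]
            for off, w in FOURTH_ORDER_WEIGHTS.items() if 0 <= i + off < n)
        for i in range(n)
    ]


isolated = [0, 0, 0, 0, 1, 0, 0, 0, 0]
adjacent_same = [0, 0, 0, 0, 1, 1, 0, 0, 0]
adjacent_opposed = [0, 0, 0, 0, 1, -1, 0, 0, 0]

print(fourth_difference_signature(isolated))          # peak of 6 at i = 4
print(fourth_difference_signature(adjacent_same))     # peaks reduced to 2
print(fourth_difference_signature(adjacent_opposed))  # peaks magnified to 10 and -10
```

The reduction from 6 to 2 for adjacent disturbances of the same sign, and the magnification to 10 for opposed signs, correspond to the behavior described for Tables 3.2 and 3.3.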

The situation in which two disturbances of similar
magnitude and sign are separated by one unperturbed data point
is presented in Table 3.4 and Figure 3.4. From the graph it can
be seen that this situation looks much like a situation with
a single isolated disturbance of somewhat greater magnitude
and opposite sign (Fig. 3.1). This brings the danger that
the observation $x_5$ (between the two observations with
disturbances) could be erroneously labeled as an outlier and
hence removed and treated as a missing point. In the next section
missing points and their replacement by the average of the
observations on each side of the missing point will be discussed.
This treatment would introduce the disturbance $d$ in the new
value for $x_5$ and hence lead to three adjacent equal
disturbances. The latter situation is presented in Table 3.5 and
Figure 3.5. Note, first, that removal of an observation and
replacement of the missing point should be followed by
recalculation of the ordered differences affected and, second,
that the magnitudes of the contributions of the disturbances to
the ordered differences are substantially reduced from the
contributions in either the isolated disturbance situation or the
separated disturbances situation. In this modified situation the
reduced thresholds presented in the next section will improve the
capability of indicating the presence of the two separated
disturbances. A threshold crossing by any of the $D_{4i}$'s with
$i = 3, 4, 5, 6$ in the modified results should serve as an
indicator that disturbances may be present in $x_4$ and $x_6$
rather than in $x_5$.

TABLE 3.3

SUCCESSIVE DIFFERENCES

Linear Case: Adjacent Opposed Equal Disturbances

TABLE 3.4

FIGURE 3.4

In addition to the occurrence of three adjacent and
equal disturbances in the treatment of two such disturbances
by replacing missing points, it is possible that this situation
can occur due to the persistence of the perturbation causing
the disturbances. The lower disturbance contributions to the
ordered differences could readily fail to produce a threshold
crossing, as could the situation with two adjacent equal
disturbances, whereas the situation with an isolated disturbance
of the same magnitude would yield a threshold crossing. These
situations with more than one adjacent, equal disturbance may
require greater consideration of the signatures identifying
them. (See Figures 3.2 and 3.5.) Such modifications are not
examined further in this report.

TABLE 3.5

SUCCESSIVE DIFFERENCES

Linear Case: Three Adjacent Equal Disturbances

FIGURE 3.5

For the present, it will be assumed that successive
differences will be incorporated in a data smoothing algorithm
for the two purposes discussed in the introduction (Section 1),
namely, identifying outliers and indicating appropriate order
polynomials for fitting the data. There are two ways that
sequential differences can be used in identifying outliers.
One is as a preliminary screening to remove some of the more
obvious outliers, to be followed by a reexamination for outliers
in the curve fitting portion of the data smoothing algorithm
as presently incorporated in the general track smoothing program
MASM3DRJ. The other approach would require sequential
differences to provide the only means of identifying outliers. As
indicated by the comparatively simple situations considered
here, this would require considerably more model development
and become a considerably larger portion of a data smoothing
program. For the purposes of this report, the first approach
will be considered appropriate.

A situation with two equal disturbances separated by
two unperturbed observations is presented in Table 3.6 and
Figure 3.6. It should be observed that when disturbances are
separated by as few as two points they can be considered
essentially as isolated disturbances. (See Table 3.1 and
Figure 3.1.)
There are other types of perturbations that could, and
possibly should, be considered for potential identification by
successive differences. Only one of these will be examined here.
This is the situation in which the torpedo changes from a
linear path at $t_r$ to a different linear path at $t_{r+1}$. This
situation is presented in Table 3.7 and Figure 3.7. As can be
seen by comparison of Table 3.7 with Table 3.1, it is possible
that a path change at $t = r$ could lead to the identification
of $x_r$ as containing a disturbance $d$ depending on the
magnitudes of $\Delta_1$ and $d$. The resemblance of the signature
(graph) of $D_{4i}$ in the two situations could be even more
striking for a value of $d$ such that $D_{4,r \pm 2}$ of Table 3.1
(corresponding to $D_{4,r \pm 2}$ of Table 3.7) were small enough
to be submerged in noise and $\Delta_1 = 3d$. That a path change
could conceivably cause a threshold crossing of $D_4^*$ by
$D_{4r}$ can be seen in the case of a $90^\circ$ change from
$\theta = 0^\circ$ to $\theta' = 90^\circ$ (or vice versa) where
$|\Delta_1| = |V| = 90$. The situation is even worse for a
$90^\circ$ change from $\theta = 45^\circ$ to
$\theta' = 135^\circ$ with $|\Delta_1| = 1.4(90) = 126$.
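
The effect of a path change on the fourth order differences can be illustrated numerically. The sketch below assumes a noise-free, continuous path whose slope changes from $|V| \cos\theta$ to $|V| \cos\theta'$ at time 0 with unit time spacing, and uses $|V| = 90$ as in the text; the function name and the time window are illustrative choices.

```python
import math


def path_change_fourth_differences(theta_before_deg, theta_after_deg,
                                   speed=90.0, half_width=4):
    """Return {time: D_4} around a slope change at time 0 (no noise, no disturbance)."""
    a_before = speed * math.cos(math.radians(theta_before_deg))
    a_after = speed * math.cos(math.radians(theta_after_deg))
    times = range(-half_width, half_width + 1)
    x = [(a_before if t <= 0 else a_after) * t for t in times]
    w = (1, -4, 6, -4, 1)
    # Entry m combines x[m..m+4] and is centred on times[m + 2].
    return {times[m + 2]: sum(wk * x[m + k] for k, wk in enumerate(w))
            for m in range(len(x) - 4)}


print(path_change_fourth_differences(0, 90))     # about -90, 180, -90 around time 0
print(path_change_fourth_differences(45, 135))   # about -127, 255, -127 around time 0
```

Under these assumptions the central value is twice the slope change, flanked by values of opposite sign, which resembles the signature of an isolated disturbance. With the value $\sigma = 4$ assumed earlier, $D_4^* = 25.1\sigma$ is about 100, so both examples would produce a threshold crossing.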

Possible methods of identifying path changes to prevent
mis-identification as outliers include reconsideration of
labeled outliers after fitting curves to the data and provision
from an external source such as control information. The first
method requires greater complication of the data smoothing
program involving cycling and hence negates the intent of a
simple screening program for outliers. The second requires
input information from another source and is also undesirable
but to a lesser extent. An alternative treatment is to accept
such identification of the point of path change as providing an
outlier to be removed from the data. This treatment appears, at
least for the present, to be a reasonable way of handling the
situation; its consequences will be examined in a subsequent
report on curve fitting.

Linear Case: Path Change at $t_r$

There is still another kind of perturbation which can
occur, and has been observed to occur. This is a change in the
noise component, represented by a change in the value of the
standard deviation $\sigma$. Such changes may be a result of
changes in the environment or of the data gathering system.
Evidence of such changes in the value of $\sigma$ should be
accommodated by corresponding changes in the threshold levels.



F. Missing Points

The occurrence of missing observations in a sequence 
of observations needs some consideration. A missing observation 
can be present in the data input or occur as a result of deletion 
of an outlier. Note that, in the latter case, recalculation of 
successive differences will be required in the vicinity of 
the deleted observation. 

As the simplest procedure for replacing missing points,
the currently used procedure of averaging over the adjacent
points will be used here. (This also will be re-examined when
curve-fitting is discussed.) Thus, when $x_r$ is missing it
will be replaced by

$$x_r = \tfrac{1}{2}(x_{r-1} + x_{r+1}) ,$$

and when adjacent values $x_r$ and $x_{r+1}$ are missing they will
be replaced by

$$x_r = x_{r-1} + \tfrac{1}{3}(x_{r+2} - x_{r-1}) = \tfrac{1}{3}(2x_{r-1} + x_{r+2}) ,$$

$$x_{r+1} = x_{r-1} + \tfrac{2}{3}(x_{r+2} - x_{r-1}) = \tfrac{1}{3}(x_{r-1} + 2x_{r+2}) .$$

The general formula for $k$ successive missing points is

$$x_{r+j} = x_{r-1} + \frac{j+1}{k+1}(x_{r+k} - x_{r-1}) \quad \text{for } j = 0, \ldots, k-1 .$$
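
The general replacement formula amounts to linear interpolation between the last observation before the gap and the first observation after it. A minimal sketch follows; the function name `fill_missing` is hypothetical.

```python
def fill_missing(x_before, x_after, k):
    """Replacements for k successive missing points between x_{r-1} and x_{r+k}."""
    step = (x_after - x_before) / (k + 1)
    return [x_before + (j + 1) * step for j in range(k)]


print(fill_missing(1.0, 3.0, 1))   # single missing point: the average
print(fill_missing(0.0, 3.0, 2))   # two adjacent missing points
```

For $k = 1$ this reduces to the average of the two neighbors, and for $k = 2$ to the one-third and two-thirds weightings given above.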



There is a serious question, however, whether an analysis of
successive differences is improved by replacement of more than
two successive missing values. It would appear more reasonable,
at least on examination of the fourth order successive
differences which involve only sequences of five observations, to
restart calculation of successive differences at the first
observation after a sequence of more than two missing
observations.

The situation involving a missing point with linear
polynomial and noise components only is presented in Table 4.1
and the accompanying definitions for the modified noise
components with their variances. Reduced thresholds could
be used as indicated in Table 4.2 and Figure 4.1.

TABLE 4.1

Linear Case: Missing Point ($x_r$) Averaged

TABLE 4.2

Linear Case: Detection Thresholds for Missing Point Datum at $t_r$

$D_{ji}^* = 3\sigma_{n_{ji}}$, Table Values for $3\sigma_{n_{ji}}$

Thresholds in Vicinity of a Missing Point

These reduced thresholds could be useful in identifying
situations involving equal disturbances separated by one
observation where that observation is labeled as an outlier and
replaced by the average of the two observations with
disturbances. Recalculation of the fourth order differences
produces the disturbance components given in the last column of
Table 3.5 which are shown with the modified thresholds in
Figure 4.2. (This situation is the same as for two disturbances
separated by a missing point.) Persistence of a threshold
crossing at $t_r$ after deletion and replacement of the
observation $x_r$ can be an indication that disturbances may be
present in $x_{r-1}$ and $x_{r+1}$ instead of, or in addition to,
a disturbance in $x_r$.

Some additional work is required here to assist in
developing that portion of the data smoothing program dealing
with successive differences. It is fairly clear that the
existence of a threshold crossing requires more effort to
determine whether it indicates an isolated outlier or a more
complicated situation. A situation with two adjacent missing
observations and no disturbances is displayed in Table 4.3
accompanied by the expressions for the noise components in
terms of the observational noise. The variances for the noise
components presented there provide the basis for the thresholds
shown in Table 4.4. The thresholds for the isolated missing
point situation are also shown in Table 4.4. Note that the
thresholds in the two missing points situation are smaller
than the corresponding ones in a situation with a single
missing point.

FIGURE 4.2. Two Disturbances Separated by a Missing Point Averaged

TABLE 4.3. Linear Case: Adjacent Missing Points Averaged

TABLE 4.4. THRESHOLDS FOR NOISE IN ONE AND TWO MISSING POINT
SITUATIONS (k such that D^T_{ji} = 3σ_{n_ji} = kσ)
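
The effect of averaged points on the fourth order thresholds can be checked numerically. The sketch below assumes independent observation noise with common variance σ², a fourth order difference centered on its middle observation, and missing points replaced by interpolation between the two bounding observations. The function name threshold_multiplier and the index convention are assumptions made for illustration. It returns the value k for which the three standard deviation threshold equals kσ.

```python
import math
from typing import Dict, Sequence

# Weights of x[i-2], ..., x[i+2] in a fourth order difference centered at i.
BINOMIAL_4 = [1, -4, 6, -4, 1]


def threshold_multiplier(center: int, missing: Sequence[int] = ()) -> float:
    """Return k such that the 3-sigma threshold for D4 at `center` is k*sigma.

    `missing` lists the indices of ONE run of consecutive missing points that
    were replaced by interpolation between the observations bounding the run.
    Each replaced value is a weighted sum of those two observations, so its
    weight in D4 is redistributed to them before the squares are summed.
    """
    weights: Dict[int, float] = {}
    if missing:
        lo, hi, k = min(missing) - 1, max(missing) + 1, len(missing)
    for offset, c in zip(range(-2, 3), BINOMIAL_4):
        idx = center + offset
        if idx in missing:
            frac = (idx - lo) / (k + 1)              # (j + 1) / (k + 1)
            weights[lo] = weights.get(lo, 0.0) + c * (1 - frac)
            weights[hi] = weights.get(hi, 0.0) + c * frac
        else:
            weights[idx] = weights.get(idx, 0.0) + c
    variance_factor = sum(w * w for w in weights.values())
    return 3.0 * math.sqrt(variance_factor)


print(round(threshold_multiplier(0), 1))             # no missing points
print(round(threshold_multiplier(5, [5]), 1))        # centered on one averaged point
print(round(threshold_multiplier(11, [10, 11]), 1))  # centered on one of two averaged points
```

Under these assumptions the multiplier is 25.1 when no points are missing and 5.1 for a difference centered on one of two adjacent averaged points, both of which agree with values quoted in this report. The single missing point case gives 6.0 at the averaged point, which is larger than the two missing point value, in line with the remark above.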

A situation in which a disturbance occurs in an observation
adjacent to a missing point is presented in Table 4.5.
(It is suspected that situations involving one or more missing
points could also involve disturbances immediately preceding
or following a missing point due to deterioration of physical
conditions.) The disturbance components are shown in relationship
to the common thresholds appropriate when there are no
missing points in Figure 4.3 and to the reduced thresholds in
Figures 4.4, 4.5 and 4.6. It can be seen that the use of the
modified thresholds can substantially increase the potential
for threshold crossings in the vicinity of a missing point.

Examination of the effects of missing points on the
ability of successive differences to indicate the presence of
disturbances is not complete. For example, situations with
disturbances preceding and/or following adjacent missing
points have not been examined. Nevertheless, some indications
of how missing points should be considered in the use of
successive differences to screen 3-D data for outliers can be
suggested at this point in the development. Under the guiding
principle of keeping the data smoothing program as short and
simple as possible, and with the understanding that a further
screening for outliers could be included in the curve fitting
portion of the program, the following steps appear reasonable:

(1) Supply missing points using the averaging method.

(2) Screen for outliers using the fourth order differences
D_{4i} and the common threshold D^T_4.

(3) Replace any outliers found by the averaging method.

(4) Screen for outliers in the vicinity of any values
replaced in Step 3 (not those in Step 1) using the
reduced thresholds D^T_{4i} for the D_{4i}'s.

(5) Any outliers found in Step 4 should be referred for manual
examination, at least until further development can provide
satisfactory provisions for inclusion in the smoothing
program.

TABLE 4.5. Linear Case: Disturbance Following Missing Point
(For n*_{ji}'s, see Missing Point Table, Table 4.1.)

FIGURE 4.4. Linear Case: Second Order Differences vs Thresholds,
Disturbance Following Missing Point

FIGURE 4.5. Linear Case: Third Order Differences vs Thresholds,
Disturbance Following Missing Point

FIGURE 4.6. Linear Case: Fourth Order Differences vs Thresholds,
Disturbance Following Missing Point



G. Noise Variance

In Section 2.D, it was assumed that the noise components
of the data were normally and independently distributed with
zero means and common variance σ². This variance, or more
specifically the standard deviation σ, must be known before
the thresholds discussed in Sections 2.D, E, and F can be
specified. Selection of an appropriate value for σ requires
more detailed examination. Three potential sources of values
for σ will be considered here.

In Reference 1, which incidentally used path segments
from the same set of data to be used in this study, sample
standard deviations of magnitudes S = 2 or 3 were calculated
for some path segments. Sample standard deviations provide
the primary sources of information on the value of σ and hence
are of considerable interest in setting threshold values. They
can be, unfortunately, contaminated by the polynomial components
in the observations as was demonstrated in the reference.
Nevertheless, a value of the order of σ = 3 or σ = 4 is
an approximation which could be used in setting thresholds for
screening for outliers. Experience with larger samples including
other runs will provide a more reasonable basis for estimating σ.

It is to be expected that there will be spatial and
temporal variations in σ. Spatial variations can be present
because of the geometry of the vehicle-sensor orientation. Data
on the value of σ and its spatial variations should
be available from previous and continuing calibration data
collected on the position location system. Information on temporal
variations should be available from the same source and should
also be monitored during the collection of any data for which
data smoothing is to be performed. It should also be expected
that there will be interaction between spatial and temporal
variation in σ, i.e., that the temporal variation can be
different for different locations on the path of the vehicle
being tracked. This would imply that the thresholds to be used
for indicating outliers may, and probably should, be changed
depending on the location and time of the data to be smoothed.
The third potential source for information on σ is
the data to be smoothed. A single estimate S of σ
may be calculated from the complete set of data or estimates
may be calculated for segments of the data. These can be
expected to be contaminated by both the polynomial and
perturbation components in the data. Reduction in the polynomial
component contribution could be obtained by using successive
differences as the source of the estimates. Thus, for example,
the sample variance of the fourth order differences

S²_{D_4} = (1/(n-1)) Σ_{i=1}^{n} (D_{4i} - D̄_4)²,

where

D̄_4 = (1/n) Σ_{i=1}^{n} D_{4i},

could be used as an estimate of the variance of the fourth
order noise components, 70σ² (the factor implied by the
threshold 3√70 σ = 25.1σ), leading to the estimate

σ̂ = S_{D_4}/√70.

This should have little or no contamination from the
polynomial components of the observations. If the outliers are
reasonably rare, the perturbation contributions should also
be small and the resulting estimate could be a reasonable
alternative. Note that estimates of σ could be obtained for
segments of the data and hence could be made to respond to the
spatial and temporal variations in σ discussed in the second
alternative.
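
A minimal sketch of this third alternative is given below. It assumes equally spaced observations with no missing points, and it uses the factor 70 (the sum of squares of the coefficients 1, -4, 6, -4, 1), which is inferred here from the 25.1σ threshold rather than taken from a stated formula. The function names are hypothetical.

```python
import math
from typing import List


def fourth_differences(x: List[float]) -> List[float]:
    """Fourth order successive differences of equally spaced observations."""
    return [x[i] - 4 * x[i + 1] + 6 * x[i + 2] - 4 * x[i + 3] + x[i + 4]
            for i in range(len(x) - 4)]


def sigma_from_fourth_differences(x: List[float]) -> float:
    """Estimate the noise standard deviation from the sample variance of D4.

    Assumes independent noise of common variance, so that each fourth
    order difference has noise variance 70 * sigma**2.
    """
    d4 = fourth_differences(x)
    n = len(d4)
    if n < 2:
        raise ValueError("at least six observations are required")
    mean = sum(d4) / n
    s2 = sum((d - mean) ** 2 for d in d4) / (n - 1)   # sample variance of D4
    return math.sqrt(s2 / 70.0)


# A noise-free cubic path leaves nothing in the fourth order differences,
# so the estimate is not inflated by the polynomial component.
cubic = [float(t ** 3) for t in range(10)]
print(sigma_from_fourth_differences(cubic))
```

Applying the function to successive segments of a path, rather than to the whole record, would give estimates that can follow spatial and temporal changes in σ. Outliers within a segment would inflate the estimate, as noted above.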

This third method of estimating σ has a direct
relationship to the method (Grubbs') incorporated in the
currently used program for identifying outliers. In fact, a
Grubbs-type screening could be performed with the sample
variances of successive differences, where an observation is
labeled an outlier if its removal provides a substantial
reduction in the sample variance. This possibility has not
been explored.



H. Algorithm for Identifying Outliers

The following algorithm is suggested for identification
and removal of gross outliers. Two basic principles are
considered essential:

(1) The algorithm should be simple and short.

(2) A subsequent and more thorough search for outliers will
be incorporated in the data smoothing program concurrent
with or following the curve-fitting portion of the program.

The steps of the algorithm are:

1. Calculate values for missing points using the method
of averaging.

2. Calculate the fourth order differences D_{4i}.

3. Identify as outliers and remove from the data any x_k for
which |D_{4k}| > 25.1σ. (The suggested value of σ to be
used here is a number between 3 and 4.)

4. Replace any x_k identified as an outlier in Step 3 using
the averaging method as in Step 1.

5. Recalculate the fourth order differences which involve x_k.
(These are D_{4,k-2}, ..., D_{4,k+2}.)

6. Re-examine the modified fourth order differences of Step 5
for outliers as in Step 3.

7. If additional outliers are found in Step 6, either
additional steps must be designed to locate potential
outliers in the vicinity of the observation x_k (from
Step 3) or the problem must be identified for manual
treatment.
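
The seven steps might be sketched as follows for a single coordinate. This is an illustration under stated assumptions, not a proposed implementation: missing observations are marked by None, the first and last observations are present, each fourth order difference is indexed by its middle observation, and the names average_fill, d4 and screen_outliers are hypothetical.

```python
from typing import List, Optional, Tuple

THRESHOLD_FACTOR = 25.1  # Step 3 threshold in units of sigma


def average_fill(x: List[Optional[float]]) -> List[float]:
    """Replace None entries by interpolation between the nearest observed
    neighbors. Assumes the first and last entries are observed."""
    out = list(x)
    i = 1
    while i < len(out) - 1:
        if out[i] is None:
            j = i
            while out[j] is None:          # find the end of the missing run
                j += 1
            k = j - i
            for m in range(k):
                out[i + m] = out[i - 1] + (m + 1) * (out[j] - out[i - 1]) / (k + 1)
            i = j
        else:
            i += 1
    return out


def d4(x: List[float], i: int) -> float:
    """Fourth order difference centered at index i (uses x[i-2..i+2])."""
    return x[i - 2] - 4 * x[i - 1] + 6 * x[i] - 4 * x[i + 1] + x[i + 2]


def screen_outliers(x: List[Optional[float]], sigma: float
                    ) -> Tuple[List[float], List[int], List[int]]:
    """Return (treated data, indices replaced, indices for manual review)."""
    limit = THRESHOLD_FACTOR * sigma
    data = average_fill(x)                                        # Step 1
    interior = range(2, len(data) - 2)
    flagged = [i for i in interior if abs(d4(data, i)) > limit]   # Steps 2-3
    work: List[Optional[float]] = list(data)
    for i in flagged:
        work[i] = None
    data = average_fill(work)                                     # Step 4
    recheck = sorted({j for i in flagged for j in range(i - 2, i + 3)
                      if j in interior})                          # Step 5
    manual = [j for j in recheck if abs(d4(data, j)) > limit]     # Steps 6-7
    return data, flagged, manual


# Linear path with one disturbance of 20 units at index 5, sigma = 4.
path = [2.0 * t for t in range(11)]
path[5] += 20.0
treated, replaced, manual = screen_outliers(path, sigma=4.0)
print(replaced, manual, treated[5])
```

In this example only the difference centered on the disturbed point exceeds 25.1σ = 100.4, so a single observation is replaced and nothing is referred for manual treatment. A larger disturbance would also push the neighboring differences over the threshold, which is the multiple crossing situation that Step 7 leaves for further development.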

I. Identifying Polynomial Components

In using successive differences to indicate the appropriate
degree of the polynomial component P_x(t), attention is directed
to the sequence of signs of the differences of the same order.
The reasoning for this is as follows. In Section 2.D it
was established that the noise component n_{ji} of the i-th
difference of order j is a linear combination of the n_i's
(the noise components of the observations). If the N_i's
(the random variables of which the n_i's are realizations
and hence the noise components of the observations, the x_i's)
have zero means as assumed in Section 2.A, then the N_{ji}'s
will also have zero means. In any sequence of differences
of order j, the mean value (1/n) Σ_{i=1}^{n} N_{ji}
will also have zero mean. In the absence of a polynomial
component with a term a_r t^r and without a disturbance
component, the r-th order difference terms are D_{ri} = n_{ri}
and hence the mean value

D̄_r = (1/n) Σ_{i=1}^{n} D_{ri} = (1/n) Σ_{i=1}^{n} n_{ri}

should be near zero. The occurrence of a sequence of differences
of order r having the same sign will have a mean value
with that same sign and hence can be interpreted as an
indication of the presence of another component. Further, a
disturbance in the form of an isolated disturbance will provide
contributions of alternating signs to a sequence of differences
of order r. Thus the reasonable interpretation of the
sequence of similar signs is the presence of a polynomial
contribution a_r to the D_{ri}'s.

Note that values of a_r which are small with respect to
the noise components of the D_{ri} (i.e., small in comparison
to σ_{n_r}) can fail to cause the sequence of D_{ri}'s to have the
sign of a_r since a_r will no longer dominate the n_{ri}'s.
Thus the absence of a sequence of D_{ri}'s of the same sign
cannot be taken as an indication that the polynomial component has
degree less than r. However, the presence of a sequence of
differences of order r having the same sign should be
considered as an indication that the polynomial component will
be of degree at least r.
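
The sign sequence property can be examined with a short routine such as the one sketched below. The function names are hypothetical, and differences are formed without regard to which observation each one is assigned to, since only the run lengths matter here.

```python
from typing import List


def successive_differences(x: List[float], order: int) -> List[float]:
    """Differences of the given order for equally spaced observations."""
    d = list(x)
    for _ in range(order):
        d = [b - a for a, b in zip(d, d[1:])]
    return d


def longest_same_sign_run(values: List[float]) -> int:
    """Length of the longest run of consecutive values sharing one sign.
    Exact zeros break a run and are not counted."""
    best = run = 0
    prev = 0
    for v in values:
        sign = (v > 0) - (v < 0)
        if sign != 0 and sign == prev:
            run += 1
        else:
            run = 1 if sign != 0 else 0
        prev = sign
        best = max(best, run)
    return best


# Noise-free parabola: every second order difference has the same sign,
# while the third order differences vanish.
parabola = [float(t * t) for t in range(10)]
print(longest_same_sign_run(successive_differences(parabola, 2)))
print(longest_same_sign_run(successive_differences(parabola, 3)))
```

For the parabola the second order differences form one unbroken run, indicating a component of degree at least two, while the third order differences give no run at all. With noise present, a short or absent run would not rule out a small a_r, as noted above.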

The nature of the property to be used for identification
of appropriate polynomial degree can, perhaps, best be
illustrated by a situation in which a polynomial of degree one
(P_1(t) = a_0 + a_1 t) is fitted by the method of least squares
to a set of data with a polynomial component of degree two (a
parabola P_2(t)) and a small noise component. The situation
might appear as sketched below.

[Sketch: straight line P_1(t) fitted to data with a parabolic
component P_2(t)]

The residual errors x_i - P_1(t_i) have sequences of
similar signs (a sequence of negative signs, followed by a
sequence of positive signs, and ending with another sequence
of negative signs). Fitting a polynomial of degree two to the
same data should produce a polynomial very close to P_2(t)
and with residuals close to the noise components and hence
with signs similar to the signs of the noise components, which
are random.
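
The residual sign pattern can be reproduced with a small numerical example. The sketch below fits a straight line by ordinary least squares to a noise-free parabola that opens downward. The data and the function names are invented for illustration only.

```python
from typing import List, Tuple


def fit_line(t: List[float], x: List[float]) -> Tuple[float, float]:
    """Least squares fit of x = a0 + a1 * t; returns (a0, a1)."""
    n = len(t)
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    stt = sum((ti - t_mean) ** 2 for ti in t)
    stx = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    a1 = stx / stt
    return x_mean - a1 * t_mean, a1


def residual_signs(t: List[float], x: List[float]) -> str:
    """Signs of the residuals x_i - P_1(t_i) as a string of '+' and '-'."""
    a0, a1 = fit_line(t, x)
    return "".join("+" if xi - (a0 + a1 * ti) > 0 else "-"
                   for ti, xi in zip(t, x))


times = [float(t) for t in range(9)]
values = [-(t * t) for t in times]       # parabola with no noise
print(residual_signs(times, values))     # --+++++--
```

The negative, positive, negative pattern is the one described above. A parabola opening upward would give the mirror image, and residuals from a second degree fit to the same data would be zero apart from rounding.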

The question of how long a sequence of D_{ri}'s of the
same sign is required to indicate the presence of a polynomial
term a_r has not been resolved. For any N_i, the probability
that N_i is greater than zero is 0.5. The probability of
a sequence of k positive values for such independent variables,
i.e., the probability that a positive value will be followed by
k-1 positive values, is

P(k positive values) = (0.5)^{k-1},

and

P(k ≥ s) = 1 - P(k < s) = (0.5)^{s-1}.

Thus

P(k ≥ 4) = 0.125, P(k ≥ 5) = 0.06, P(k ≥ 6) = 0.03.

Thus a sequence of six or more successive differences of the
same sign would be unlikely to occur due to noise alone,
if the noise components were independent. But the noise
components are not independent and, as established in Section
2.D, are negatively correlated. The probability P(k ≥ s)
is substantially less than the value given above in the case
of independence and it is suspected that a sequence of four
differences of order k can be taken as an indication that
the polynomial component is at least of degree k.
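
Both parts of this argument can be put in numerical form. The sketch below evaluates the run probability under independence and, as a separate check on the negative correlation, the correlation between neighboring differences of a given order when the observation noise is independent with common variance. The function names are hypothetical.

```python
from math import comb


def run_probability(s: int) -> float:
    """P(k >= s) = 0.5 ** (s - 1): chance that a value is followed by at
    least s - 1 more of the same sign, for independent symmetric noise."""
    return 0.5 ** (s - 1)


def lag_correlation(order: int, lag: int = 1) -> float:
    """Correlation between two differences of the given order that are
    `lag` steps apart, assuming independent observation noise."""
    c = [(-1) ** j * comb(order, j) for j in range(order + 1)]
    variance = sum(w * w for w in c)
    covariance = sum(c[j] * c[j + lag] for j in range(len(c) - lag))
    return covariance / variance


for s in (4, 5, 6):
    print(s, run_probability(s))
print(lag_correlation(4))     # adjacent fourth order differences
```

Adjacent fourth order differences have correlation -56/70 = -0.8 under these assumptions, so their noise components tend strongly to alternate in sign. This supports the expectation that long runs of one sign are considerably less probable than the independence values suggest.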
The situation is complicated even further by the fact
that, for example, fourth order differences involve only five 
consecutive observations but the contemplated length of data 
segments considered for curve fitting is seven or eleven. 

It is conceivable that a polynomial fitted to the five points 
covered by a fourth order successive difference would be of 
a lower degree than one fitted to a longer sequence. On the 
other hand, if a polynomial of specified degree does not fit 
a sample of given length very well, it cannot be expected to 
fit a sample of greater length very well. Thus the informa- 
tion obtained is of a negative form in that it can be used 
to eliminate lower degree polynomials from further consider- 
ation . 

There is a temptation to apply standard sign tests or 
the theory of runs to sequences of successive differences. 
These, however, require independence of noise components and 
would involve substantially more development to make them 
suitable for incorporation. They could be useful in the 
curve-fitting portion of the data smoothing program to test 
whether the polynomial degree is appropriate by testing 
whether the residual errors are of random sign or whether 
sign patterns exist as illustrated above. 
3. APPLICATION OF SUCCESSIVE DIFFERENCES

The use of successive differences in locating outliers
and in giving indication of appropriate polynomial degree for
curve fitting will be illustrated for a specific set of 3-D
data. This data was obtained from a test in which a torpedo
was launched against a submarine at the Naval Undersea Warfare
Engineering Station. The 3-D data involves coordinates recorded
at equally spaced times with very few data points missing. Data
for the x and y coordinates and a plot containing every fifth
time are provided in the Appendix.

Suppose, now, that a noise standard deviation value
σ = 4 is appropriate so that the threshold level for the
fourth order differences is D^T_4 = 25.1σ = 100.4. The first
threshold crossings in the data occur at t_i = 908, 909, 910,
and 911. Table 5.1 shows the values of x_i, y_i and the successive
differences in the neighborhood of these points. (These are
reproduced here from the appendix for comparison with the
results of treatment.) The situation here is somewhat confused.
It does not conform to the signature (pattern) for a single
isolated disturbance. One possible procedure is to declare
all four observations on x and on y as outliers. Instead
of doing this, consider one point at a time. Since the largest
magnitudes of the D_{4i} occur at time t_i = 909, the
corresponding values of x_i and y_i will be declared outliers.
Replacing these values with the average of the values at
t_i = 908 and t_i = 910 yields the modified results presented
in Table 5.2. All of the fourth order successive differences
are now less than D^T_4 and, moreover, are less than the modified
thresholds given in Table 4.2 (see Figure 4.1).

There may, and should, be some doubt as to whether
declaration of the observations at t_i = 909 as isolated
outliers is sufficient treatment for this situation. As can be
seen in Table 5.2, the fourth order differences at t_i = 911
are quite large even though they do not exceed their threshold.
Further, the signatures at both x_i and y_i are similar to
what would be anticipated for isolated disturbances at t = 911.
If, for example, the noise standard deviation were σ = 3
instead of σ = 4, then the fourth order differences for x_i and
y_i at t = 911 would both exceed their thresholds and the
observations would be declared outliers. The
results of this treatment are shown in Table 5.3. All of the
large successive differences have been reduced substantially
and the situation now appears to be free of disturbances.
(Reduced thresholds for situations involving two disturbances
separated by a non-disturbed observation are not available
but should be derived so that the treatment could be completed.)

TABLE 5.1. SUCCESSIVE DIFFERENCES NEAR t_i = 909

TABLE 5.2. OUTLIER REPLACED AT t_i = 909

TABLE 5.3. OUTLIERS REPLACED AT t_i = 909, 911

As a peripheral examination of this situation, the
possibility of treating the observations at t = 910 as the
initial outliers was examined. Note that the fourth order
differences at t = 909 and t = 910 are reasonably close and could
possibly have been reversed in order of magnitude by the noise
components. The results are presented in Table 5.4. Both x_i
and y_i at t = 909 are now indicated as outliers, exceeding
not only the modified thresholds but the general threshold
D^T_4 = 100.4. Replacing both points as outliers yields the results
shown in Table 5.5. An interesting outcome should be noted.
The fourth order differences for both x and y at t = 910
now exceed the modified threshold appropriate for situations
involving adjacent missing points, namely, D^T_4 = 5.1σ = 20.4.
(See Table 4.4.) But the observations at t = 910 have already
been modified. This suggests that the observations at t = 910
should not have been considered outliers initially.

The situation in the vicinity of t = 910 in the data
provides illustration of several features of the use of
successive differences in identification of outliers. First,
identification of outliers by successive differences can be
awkward when there are several threshold crossings adjacent
to each other. As can be seen in the situation with threshold
crossings at times t = 908, 909, 910, and 911, rejection of
the observations at t = 909 and 911 appears to be sufficient
to reduce the ordered differences to magnitudes that could be
produced by noise. A procedure involving rejection of one
of the observations at a time, starting with the largest one
and recalculating the successive differences to be examined
for other threshold crossings, seems reasonable. If several of
the successive differences have nearly the same magnitudes,
however, this could lead to rejection of the wrong observations,
as demonstrated by rejecting the observations at
t = 910 first.

TABLE 5.4. OUTLIER AT t_i = 910

TABLE 5.5. OUTLIERS REPLACED AT t_i = 909, 910
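
The one-at-a-time procedure can be sketched as follows. For simplicity the sketch applies the common threshold on every pass and does not switch to the reduced thresholds in the vicinity of a replaced point, which is an assumption of this illustration. The function names and the limit on the number of passes are also hypothetical.

```python
from typing import List, Tuple


def d4_at(x: List[float], i: int) -> float:
    """Fourth order difference centered at index i (uses x[i-2..i+2])."""
    return x[i - 2] - 4 * x[i - 1] + 6 * x[i] - 4 * x[i + 1] + x[i + 2]


def reject_one_at_a_time(x: List[float], limit: float,
                         max_passes: int = 10) -> Tuple[List[float], List[int]]:
    """Repeatedly replace the observation with the largest |D4| above
    `limit` by the average of its neighbors, then recalculate."""
    data = list(x)
    replaced: List[int] = []
    for _ in range(max_passes):
        # Points already replaced are not candidates for a second rejection.
        candidates = {i: abs(d4_at(data, i))
                      for i in range(2, len(data) - 2) if i not in replaced}
        if not candidates:
            break
        worst = max(candidates, key=candidates.get)
        if candidates[worst] <= limit:
            break
        data[worst] = 0.5 * (data[worst - 1] + data[worst + 1])
        replaced.append(worst)
    return data, replaced


# One disturbance of 30 units produces three adjacent crossings of 100.4
# (|D4| = 120, 180, 120), yet only one observation needs to be replaced.
path = [2.0 * t for t in range(11)]
path[5] += 30.0
treated, replaced = reject_one_at_a_time(path, limit=100.4)
print(replaced, treated[5])
```

When two adjacent differences are nearly equal in magnitude, the choice of the largest is sensitive to noise, which is the difficulty illustrated by the treatment starting at t = 910.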

The second feature of this example is an outgrowth of
the first. An algorithm, and the subsequent computer program,
which will provide satisfactory treatment for multiple adjacent
threshold crossings will be awkward to produce. Nevertheless,
merely identifying such situations and relegating them for
manual processing should be avoided since it contradicts the
objective of complete automatic processing.

The third feature arises when the first order differences
are examined. There appears to be a substantial change in
velocity (the a_1 term of the polynomial component) in both
the x and y coordinates. The possibility of the perturbation
in the vicinity of t = 909 being due to a change in the
polynomial component instead of, or in addition to, disturbances
causing outliers should be considered. This situation should
be re-examined when curve-fitting to the data is attempted.

One final comment on this situation: the analysis
was performed by consideration of the fourth order differences
(the D_{4i}'s) only. It appears that the second and third
order differences confirm the indications of the D_{4i}'s but
add little of a supplementary nature. Again, this points to
the use of only one order of differences for indication of
outliers and the preference should be for the higher order
as containing the least contamination by any polynomial
component in the observations.

Another example of a threshold crossing occurs at
t = 851 (Table 6.1). Note that in this situation only the
y coordinate produces a crossing. The question as to whether
the x observation should also be rejected must be
considered. In order to answer this question it may be necessary
to examine the data collection process (e.g., the sensors and
the geometry of the situation). The results of replacing
both the x and y observations at t = 851 are presented
in Table 6.2. Whether the improvement in the x coordinates
is worth the effort is debatable at this stage.

A third event of threshold crossings in the data occurs
in the vicinity of t = 893. Again, multiple, adjacent
crossings occur but only in the y coordinates. (See Table 7.1.)
The successive differences after replacing the observations
at t = 893 are shown in Table 7.2 and after replacing the
observations at both t = 893 and t = 890 in Table 7.3.
Although the D_{4i}'s are well below the general bound
D^T_4 = 25.1σ for σ = 4 or σ = 3, they exceed the modified
bounds given in Table 4.1 for observations in the vicinity
of a single missing point. This situation has not been
pursued further. As in the two situations already discussed
(vicinities of t = 851 and t = 909), there appears to
be a substantial change in the velocity components of the
vehicular path as evidenced by the values of the D_{1i}'s.

TABLE 6.2. OUTLIERS AT t_i = 851 REPLACED

TABLE 7.1. SUCCESSIVE DIFFERENCES NEAR t_i = 893

TABLE 7.2. OUTLIERS REPLACED AT t_i = 893

TABLE 7.3. OUTLIERS REPLACED AT t_i = 890, 893

The three situations examined above are the only ones
in which values of D_{4i}'s exceed the threshold D^T_4 = 25.1σ
= 100.4 with σ = 4. In all three situations the values of
the D_{1i}'s indicate that there is a possibility of a
perturbation in the form of a change in the polynomial component
of the observations. It would thus appear desirable to
postpone further screening for outliers until the curve-fitting
portion of the data smoothing effort. After such treatment
of this data set and, possibly, experience gained from
examination of other data sets, the desirability of finer
screening for outliers using successive differences should
be reassessed.

The final comments on the data set considered here
pertain to information provided by successive differences
on the appropriate degree of the polynomial to be used in
curve fitting. As described in Section 2.I, the primary
evidence to be considered here is the existence of sequences
of successive differences of a given order having the same
sign. Naturally, sequences of first order differences (the
D_{1i}'s) having the same sign
occur in the data and would be expected for a torpedo path
since a torpedo without a velocity cannot hope to intercept
its target. No attempt to fit a polynomial of degree less
than one is contemplated. The only occurrences of sequences
of D_{4i}'s with the same signs and having length
greater than four start at t = 859 and t = 863. The
probability of a sequence of similar signs of length
greater than s = 4 is P(k ≥ 5) = (0.5)^4 = 0.06 (if the
differences were due to noise only and the noise components
of the differences were independent). The reduced probability
of this event, due to the lack of independence, suggests that
the polynomials to fit both the x and y coordinates
in the segment t = 851 to t = 867 should be of degree
at least three and, more likely, four. Examination of the plot
of the torpedo path shown in the appendix indicates that this
is, indeed, the segment of the torpedo path where the greatest
changes occurred.
4. CONCLUSIONS AND RECOMMENDATIONS

From the process of model development and its subsequent
application to data from a torpedo path it should
be evident that successive differences provide some capability
for detection of outliers. For practical purposes, an "outlier"
can be defined as an observation whose magnitude is
unreasonably large when only its polynomial and noise components
are considered. An algorithm for using successive differences to
detect outliers is presented in Section 2.H. In this algorithm,
attention is centered on the fourth order successive
differences (the D_{4i}'s) and successive differences of lower
orders are ignored in screening for outliers.

As a secondary use, successive differences provide 
some indication of appropriate polynomial degrees for the curve 
fitting portion of the data smoothing process. This information 
is negative in form with a substantial sequence of similar 
signs for successive differences of a given order providing 
evidence that a polynomial of degree lower than that order 
cannot be expected to provide an acceptable fit to the data 
which produced that sequence. 

The outline for the algorithm presented in Section 2.H
requires additional development before it can be incorporated
in a data smoothing program. The primary need here is for a
more thorough treatment for situations involving missing
points.

Since outliers are to be identified by crossings of
threshold values by successive differences and since these
threshold values are specified in terms of the standard deviation
σ of the noise, the selection of an appropriate value for σ
is fundamental to the screening process. Potential sources
for values for σ are the data gathering system and the data
available from torpedo paths.

The possibility of modifying the thresholds (conceptually,
by using a smaller value for the coefficient of σ in Section 2)
to remove some of the outliers identified in the subsequent
curve-fitting portion of the data smoothing process should
be examined. Any such outliers that can be identified by
successive differences can provide substantial reductions in
repetitions of curve-fitting to the affected data segments.
Further, the possibility of using missing points in selecting
appropriate data segments for curve-fitting will be facilitated
by early identification of missing points caused by elimination
of outliers. This use will be discussed in a subsequent
report.
APPENDIX A



DATA FROM A TORPEDO PATH AT NUWES 

The model developed in this report was applied to 
data collected on a specific test in which a torpedo was 
launched against a submarine at the Naval Undersea Warfare 
Engineering Station. A major part of the torpedo path is 
sketched in the accompanying figure and the data is listed 
in the table which follows. Only the x and y coordinates 
are included. 